Implement native interleave for ListView #9558

Open
vegarsti wants to merge 5 commits into apache:main from vegarsti:list-view-interleave-native

@vegarsti
Contributor

@vegarsti vegarsti commented Mar 15, 2026

This PR adds a native implementation of interleave for the ListView type. Also adds a benchmark.
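For readers less familiar with the layout, here is a minimal sketch of what interleaving ListView rows by per-row copy looks like. It is std-only and does not use the arrow-rs API: `SimpleListView` and `interleave_rows` are hypothetical names for illustration, nulls are left out, and the actual kernel in this PR may be structured differently.

```rust
/// Simplified stand-in for a ListView array (assumption, not the arrow-rs type):
/// row `i` is `values[offsets[i]..offsets[i] + sizes[i]]`. Nulls are omitted.
#[derive(Debug, Clone, PartialEq)]
struct SimpleListView {
    offsets: Vec<usize>,
    sizes: Vec<usize>,
    values: Vec<i64>,
}

impl SimpleListView {
    /// Returns the values referenced by row `i`.
    fn row(&self, i: usize) -> &[i64] {
        let start = self.offsets[i];
        &self.values[start..start + self.sizes[i]]
    }
}

/// Per-row-copy interleave: for each `(array_index, row_index)` pair, copy that
/// row's values into a fresh buffer and record the new offset and size.
fn interleave_rows(arrays: &[&SimpleListView], indices: &[(usize, usize)]) -> SimpleListView {
    let mut out = SimpleListView {
        offsets: Vec::with_capacity(indices.len()),
        sizes: Vec::with_capacity(indices.len()),
        values: Vec::new(),
    };
    for &(array_idx, row_idx) in indices {
        let row = arrays[array_idx].row(row_idx);
        out.offsets.push(out.values.len());
        out.sizes.push(row.len());
        out.values.extend_from_slice(row);
    }
    out
}

fn main() {
    let a = SimpleListView { offsets: vec![0, 2], sizes: vec![2, 1], values: vec![1, 2, 3] };
    // Rows of `b` overlap and are out of order, which ListView allows.
    let b = SimpleListView { offsets: vec![1, 0], sizes: vec![2, 3], values: vec![10, 20, 30] };
    let out = interleave_rows(&[&a, &b], &[(1, 0), (0, 1), (1, 1)]);
    println!("{:?}", out);
}
```

Note that this approach writes every selected row out in full, so values shared between overlapping rows are duplicated in the output.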

Performance improves by more than 30% in all cases in the benchmark:

| Benchmark | Time (branch) | Change vs baseline | Speedup |
|---|---|---|---|
| `list_view<i64>(0.1,0.1,20) 100` | 2.40 µs | −41.2% | 1.70× |
| `list_view<i64>(0.1,0.1,20) 400` | 6.50 µs | −36.6% | 1.58× |
| `list_view<i64>(0.1,0.1,20) 1024` | 15.67 µs | −38.1% | 1.62× |
| `list_view<i64>(0.1,0.1,20) 1024 4-arr` | 15.76 µs | −35.7% | 1.55× |
| `list_view<i64>(0.0,0.0,20) 100` | 1.49 µs | −41.0% | 1.70× |
| `list_view<i64>(0.0,0.0,20) 400` | 3.99 µs | −30.3% | 1.43× |
| `list_view<i64>(0.0,0.0,20) 1024` | 9.58 µs | −40.7% | 1.69× |
| `list_view<i64>(0.0,0.0,20) 1024 4-arr` | 9.45 µs | −36.2% | 1.57× |
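
The speedup figures appear to follow from the change percentages as `1 / (1 + change)`, assuming "change" is the relative change in run time against the baseline. A small sketch of that arithmetic (`speedup_from_change` is a hypothetical helper, not part of the benchmark code):

```rust
/// Converts a relative time change in percent (negative means faster)
/// into a speedup factor: new_time = old_time * (1 + change), so
/// speedup = old_time / new_time = 1 / (1 + change).
fn speedup_from_change(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}

fn main() {
    // First row of the results above: a -41.2% change.
    println!("{:.2}x", speedup_from_change(-41.2)); // prints "1.70x"
}
```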

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 15, 2026
@vegarsti vegarsti force-pushed the list-view-interleave-native branch from a1131f2 to 16f2287 Compare March 15, 2026 15:42
@brancz
Contributor

brancz commented Mar 16, 2026

Do you mind comparing this to the fallthrough performance of #9562 ?

@vegarsti
Contributor Author

Do you mind comparing this to the fallthrough performance of #9562 ?

Oh for sure, thanks for reminding me!

@vegarsti
Contributor Author

vegarsti commented Mar 16, 2026

Updated the description with results now. It's not looking like a win..!

@brancz
Contributor

brancz commented Mar 16, 2026

I would say let's merge the fallthrough and iterate on this version. I'm sure there are several possibilities for optimizations.

@asubiotto
Contributor

FWIW I pushed up the branch I've had marinating locally for a month or two in case it's helpful: main...polarsignals:arrow-rs:asubiotto/lvinterleave. I believe the benchmarks showed a slight regression for interleaves of small lists, but overall the perf was an improvement. I'm not able to take a closer look right now, but sharing in case it's helpful.

@vegarsti
Contributor Author

FWIW I pushed up the branch I've had marinating locally for a month or two in case it's helpful: main...polarsignals:arrow-rs:asubiotto/lvinterleave. I believe the benchmarks showed a slight regression for interleaves of small lists, but overall the perf was an improvement. I'm not able to take a closer look right now, but sharing in case it's helpful.

Thank you!

@vegarsti vegarsti force-pushed the list-view-interleave-native branch from 6e8412e to b18c3a6 Compare March 19, 2026 12:48
@vegarsti
Contributor Author

Updated implementation and results now!

@asubiotto
Contributor

Sorry for dropping the ball on this! I think this is going in the right direction, but when I pulled this in to try it out I realized that it doesn't work very well when interleaving ListViews with a high number of shared elements (i.e. offset/size windows are overlapping). I think we can get the best of both worlds by computing a heuristic: compare how many values are referenced with how many values are in the backing array, then decide whether to do per-row copies as this PR does or a full concat of the backing slice, which preserves overlapping encodings and can be much cheaper in the end. Here is a commit that implements that on top of this PR with a benchmark: polarsignals@7cb6880
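
To make the idea concrete, here is a rough std-only sketch of such a heuristic. The names (`Strategy`, `referenced_values`, `choose_strategy`) and the exact threshold (`referenced > backing`) are assumptions for illustration; the linked commit may compute and compare these quantities differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Strategy {
    /// Copy each selected row's values into a new buffer.
    PerRowCopy,
    /// Concatenate the backing value buffers as-is and only rewrite offsets,
    /// which keeps overlapping rows shared.
    ConcatBacking,
}

/// Sums the sizes of all selected rows. `sizes[a][r]` is the size of row `r`
/// in source array `a`; `indices` holds `(array_index, row_index)` pairs.
fn referenced_values(sizes: &[Vec<usize>], indices: &[(usize, usize)]) -> usize {
    indices.iter().map(|&(a, r)| sizes[a][r]).sum()
}

/// Assumed rule: if the selected rows reference more values in total than the
/// backing buffers hold, rows must overlap heavily, so copying per row would
/// write more data than concatenating the backing buffers.
fn choose_strategy(
    sizes: &[Vec<usize>],
    backing_lens: &[usize],
    indices: &[(usize, usize)],
) -> Strategy {
    let referenced = referenced_values(sizes, indices);
    let backing: usize = backing_lens.iter().sum();
    if referenced > backing {
        Strategy::ConcatBacking
    } else {
        Strategy::PerRowCopy
    }
}

fn main() {
    // One source whose 80 rows all view the same 20 backing values.
    let sizes = vec![vec![20usize; 80]];
    let indices: Vec<(usize, usize)> = (0..80).map(|r| (0, r)).collect();
    println!("{:?}", choose_strategy(&sizes, &[20], &indices));
}
```

The only extra work over the plain per-row path is the pass that sums the selected sizes.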

There is a slight perf hit vs your branch to compute the heuristic (summing referenced sizes), but I think it's worth it in the grand scheme of things:

interleave list_view<i64>(0.1,0.1,20) 100 [0..100, 100..230, 450..1000]
                        time:   [2.8553 µs 2.8661 µs 2.8777 µs]
                        change: [−39.782% −39.429% −39.058%] (p = 0.00 < 0.05)
interleave list_view<i64>(0.1,0.1,20) 400 [0..100, 100..230, 450..1000]
                        time:   [8.2066 µs 8.2440 µs 8.2838 µs]
                        change: [−41.803% −41.460% −41.123%] (p = 0.00 < 0.05)
interleave list_view<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000]
                        time:   [22.291 µs 22.424 µs 22.580 µs]
                        change: [−39.377% −38.883% −38.328%] (p = 0.00 < 0.05)
interleave list_view<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000, 0..1000]
                        time:   [21.744 µs 21.868 µs 22.003 µs]
                        change: [−40.397% −39.966% −39.515%] (p = 0.00 < 0.05)
interleave list_view<i64>(0.0,0.0,20) 100 [0..100, 100..230, 450..1000]
                        time:   [1.7642 µs 1.7770 µs 1.7937 µs]
                        change: [−36.120% −35.680% −35.219%] (p = 0.00 < 0.05)
interleave list_view<i64>(0.0,0.0,20) 400 [0..100, 100..230, 450..1000]
                        time:   [5.1748 µs 5.2052 µs 5.2392 µs]
                        change: [−29.000% −28.500% −28.000%] (p = 0.00 < 0.05)
interleave list_view<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000]
                        time:   [12.528 µs 12.631 µs 12.741 µs]
                        change: [−29.293% −28.511% −27.801%] (p = 0.00 < 0.05)
interleave list_view<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000, 0..1000]
                        time:   [13.009 µs 13.098 µs 13.192 µs]
                        change: [−26.841% −26.193% −25.550%] (p = 0.00 < 0.05)
interleave list_view_overlapping<i64>(80x,20) 100 [0..100, 100..230, 450..1000]
                        time:   [1.8046 µs 1.8271 µs 1.8547 µs]
                        change: [−44.472% −43.715% −42.935%] (p = 0.00 < 0.05)
interleave list_view_overlapping<i64>(80x,20) 400 [0..100, 100..230, 450..1000]
                        time:   [3.3896 µs 3.4283 µs 3.4689 µs]
                        change: [−66.387% −66.073% −65.773%] (p = 0.00 < 0.05)
interleave list_view_overlapping<i64>(80x,20) 1024 [0..100, 100..230, 450..1000]
                        time:   [5.7748 µs 5.8133 µs 5.8482 µs]
                        change: [−72.104% −71.879% −71.641%] (p = 0.00 < 0.05)
interleave list_view_overlapping<i64>(80x,20) 1024 [0..100, 100..230, 450..1000, 0..1000]
                        time:   [6.2896 µs 6.3539 µs 6.4243 µs]
                        change: [−69.684% −69.377% −69.083%] (p = 0.00 < 0.05)

@alamb
Contributor

alamb commented Apr 16, 2026

Sorry for dropping the ball on this! I think this is going in the right direction, but when I pulled this in to try it out I realized that it doesn't work very well when interleaving ListViews with a high number of shared elements (i.e. offset/size windows are overlapping).

Could you perhaps make a PR that adds this case as a benchmark?

let list_i64_no_nulls =
create_primitive_list_array_with_seed::<i32, Int64Type>(8192, 0.0, 0.0, 20, 42);

let list_view_i64: ListViewArray =
Contributor

Can we please add this benchmark as a separate PR (to make it easier to run the automated benchmark runners)?

Contributor Author

Definitely!

Contributor

merged!

alamb pushed a commit that referenced this pull request Apr 16, 2026
Ref #9558 (comment)

---------

Co-authored-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
Labels

arrow (Changes to the arrow crate), performance

Development

Successfully merging this pull request may close these issues.

Add native ListView support for interleave kernel

4 participants